1.
Adv Neurobiol ; 36: 557-570, 2024.
Article in English | MEDLINE | ID: mdl-38468053

Abstract

Brain tumor detection is crucial for clinical diagnosis and efficient therapy. In this work, we propose a hybrid approach for brain tumor classification based on both fractal geometry features and deep learning. In our proposed framework, we adopt the concept of fractal geometry to generate a "percolation" image with the aim of highlighting important spatial properties in brain images. Then both the original and the percolation images are provided as input to a convolutional neural network to detect the tumor. Extensive experiments, carried out on a well-known benchmark dataset, indicate that using percolation images can help the system perform better.


Subjects
Brain Neoplasms , Fractals , Humans , Neural Networks, Computer , Brain Neoplasms/diagnostic imaging , Brain/diagnostic imaging , Brain/pathology
2.
Sensors (Basel) ; 23(10)2023 May 12.
Article in English | MEDLINE | ID: mdl-37430601

Abstract

In computer vision, semantic segmentation is the task of recognizing objects in images at the pixel level, that is, by classifying each pixel. The task is complex and requires sophisticated skills and contextual knowledge to identify object boundaries. The importance of semantic segmentation in many domains is undisputed. In medical diagnostics, it facilitates the early detection of pathologies, thus mitigating their possible consequences. In this work, we review the literature on deep ensemble learning models for polyp segmentation and develop new ensembles based on convolutional neural networks and transformers. An effective ensemble requires diversity among its components. To this end, we combined different models (HarDNet-MSEG, Polyp-PVT, and HSNet) trained with different data augmentation techniques, optimization methods, and learning rates, a combination we show experimentally to yield a better ensemble. Most importantly, we introduce a new method to obtain the segmentation mask by averaging intermediate masks after the sigmoid layer. In our extensive experimental evaluation, the average performance of the proposed ensembles over five prominent datasets beat every other solution we are aware of. Furthermore, when each dataset is considered individually, the ensembles outperformed the state of the art on two of the five, without having been specifically trained for them.
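The mask-fusion idea in this abstract (averaging intermediate masks after the sigmoid layer) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each model outputs a 2D grid of raw logits, that the averaged probabilities are binarized at a 0.5 threshold, and it uses plain Python lists in place of tensors; the function names are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fuse_after_sigmoid(logit_maps: List[Grid], threshold: float = 0.5) -> List[List[int]]:
    """Average per-pixel sigmoid probabilities over models, then binarize."""
    if not logit_maps:
        raise ValueError("need at least one model output")
    rows, cols = len(logit_maps[0]), len(logit_maps[0][0])
    mask = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Convert each model's logit to a probability before averaging.
            mean_p = sum(sigmoid(m[r][c]) for m in logit_maps) / len(logit_maps)
            row.append(1 if mean_p >= threshold else 0)
        mask.append(row)
    return mask


# Two hypothetical models, each producing a 1x2 logit map.
print(fuse_after_sigmoid([[[2.0, -2.0]], [[-1.0, -3.0]]]))  # prints [[1, 0]]
```

Averaging in probability space bounds each model's contribution to the fused value, so one model with extreme logits sways the mask less than it would if raw logits were averaged.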


Subjects
Electric Power Supplies , Knowledge , Learning , Neural Networks, Computer , Semantics
3.
J Imaging ; 9(2)2023 Feb 06.
Article in English | MEDLINE | ID: mdl-36826954

Abstract

Skin detection, the process of distinguishing skin from non-skin regions in a digital image, is widely used in applications ranging from hand-gesture analysis to body-part tracking to facial recognition. It is a challenging problem that has received considerable attention and many proposals from the research community in the context of intelligent systems, but the lack of common benchmarks and unified testing protocols has made fair comparisons among approaches very difficult. Recently, the success of deep neural networks has had a major impact on the field of image segmentation, producing a variety of successful models. In this work, we survey the most recent research in this field and propose fair comparisons between approaches, using several different datasets. The main contributions of this work are (i) a comprehensive review of the literature on skin-color detection approaches and a comparison of approaches that may help researchers and practitioners choose the best method for their application; (ii) a comprehensive list of datasets that provide ground truth for skin detection; and (iii) a testing protocol for evaluating and comparing different skin-detection approaches. Moreover, we propose an ensemble of convolutional neural networks and transformers that achieves state-of-the-art performance.

4.
J Imaging ; 7(12)2021 Nov 27.
Article in English | MEDLINE | ID: mdl-34940721

Abstract

Convolutional neural networks (CNNs) have gained prominence in the research literature on image classification over the last decade. One shortcoming of CNNs, however, is their poor generalization and tendency to overfit when trained on small datasets. Data augmentation directly addresses this problem by generating new training samples that provide additional information. In this paper, we investigate the performance of more than ten different sets of data augmentation methods, including two novel approaches proposed here: one based on the discrete wavelet transform and the other on the constant-Q Gabor transform. A pretrained ResNet50 network is fine-tuned with each augmentation method. Combinations of these networks are evaluated and compared across four benchmark data sets of images representing diverse problems and collected by instruments that capture information at different scales: a virus data set, a bark data set, a portrait data set, and a LIGO glitches data set. Experiments demonstrate the superiority of this approach: the best ensemble proposed in this work achieves state-of-the-art (or comparable) performance across all four data sets. This result shows that varying data augmentation is a feasible way to build an ensemble of classifiers for image classification.

5.
Sensors (Basel) ; 21(17)2021 Aug 29.
Article in English | MEDLINE | ID: mdl-34502700

Abstract

In this paper, we examine two strategies for boosting the performance of ensembles of Siamese neural networks (SNNs) for image classification, using two loss functions (Triplet and Binary Cross Entropy) and two methods for building the dissimilarity spaces (FULLY and DEEPER). With FULLY, the distance between a pattern and a prototype is calculated by comparing the two images using the fully connected layer of the Siamese network. With DEEPER, each pattern is described using a deeper layer combined with dimensionality reduction. The basic design of the SNNs uses supervised k-means clustering to build the dissimilarity spaces that train a set of support vector machines, which are then combined by sum rule for a final decision. The robustness and versatility of this approach are demonstrated on several cross-domain image data sets, including a portrait data set, two bioimage data sets, and two animal vocalization data sets. Results show that the strategies employed in this work to improve dissimilarity-based image classification with SNNs are closing the gap with standalone CNNs. Moreover, when our best system is combined with an ensemble of CNNs, the resulting performance is superior to that of the CNN ensemble alone, demonstrating that our new strategy extracts additional information.


Subjects
Neural Networks, Computer , Animals
6.
Sensors (Basel) ; 21(5)2021 Feb 24.
Article in English | MEDLINE | ID: mdl-33668172

Abstract

Traditionally, classifiers are trained to predict patterns within a feature space. The image classification system presented here instead trains classifiers to predict patterns within a vector space built by combining the dissimilarity spaces generated by a large set of Siamese Neural Networks (SNNs). A set of centroids is calculated from the patterns in the training data sets with supervised k-means clustering. The centroids are used to generate the dissimilarity space via the Siamese networks. The vector space descriptors are extracted by projecting patterns onto the similarity spaces, and SVMs classify an image by its dissimilarity vector. The versatility of the proposed approach is demonstrated by evaluating the system on different types of images across two domains: two medical data sets and two animal audio data sets with vocalizations represented as images (spectrograms). Results show that the proposed system is competitive with the best-performing methods in the literature, obtaining state-of-the-art performance on one of the medical data sets, and does so without ad hoc optimization of the clustering methods on the tested data sets.
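The dissimilarity-space construction described above can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the authors' system: "supervised k-means" is taken to mean running k-means separately inside each class and pooling the centroids, and plain Euclidean distance stands in for the Siamese network's learned dissimilarity. All names are hypothetical, and the SVM stage is omitted.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance: a stand-in for the Siamese network's learned
    # dissimilarity (assumption; the real system compares images with an SNN).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means; requires k <= len(points)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def class_centroids(train: Dict[str, List[Vector]], k: int) -> List[Vector]:
    """'Supervised' k-means: cluster each class separately, pool the centroids."""
    pooled: List[Vector] = []
    for label in sorted(train):
        pooled.extend(kmeans(train[label], k))
    return pooled


def dissimilarity_vector(pattern: Vector, centroids: List[Vector]) -> Vector:
    """Project a pattern into the dissimilarity space: one distance per centroid."""
    return [dist(pattern, c) for c in centroids]
```

Each pattern thus becomes a fixed-length vector of distances to all pooled centroids, which is the kind of input an SVM could then be trained on.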

7.
Sensors (Basel) ; 20(6)2020 Mar 14.
Article in English | MEDLINE | ID: mdl-32183334

Abstract

In recent years, deep learning has achieved considerable success in pattern recognition, image segmentation, and many other classification fields. There are many studies and practical applications of deep learning to image, video, and text classification. Activation functions play a crucial role in the discriminative capability of deep neural networks, and the design of new "static" or "dynamic" activation functions is an active area of research. The main difference between the two is that "static" activations treat all neurons and layers identically, while "dynamic" activations learn the parameters of the activation function independently for each layer or even each neuron. Although "dynamic" activation functions perform better in some applications, their increased number of trainable parameters requires more computational time and can lead to overfitting. In this work, we propose a mixture of "static" and "dynamic" activation functions, which are stochastically selected at each layer. Our idea for model design is to modify some layers of the best-performing CNN models, along the lines of their different functional blocks, with the aim of designing new models to be used as stand-alone networks or as components of an ensemble. We propose to replace each activation layer of a CNN (usually a ReLU layer) with an activation function drawn at random from a set of activation functions; in this way, each resulting CNN has a different set of activation layers.
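The layer-wise random replacement of activations can be illustrated with a toy sketch. It is only an assumption-laden illustration: the pool below holds four fixed-form ("static") functions chosen for the example, whereas the paper's pool also contains learnable ("dynamic") activations with per-layer parameters, which are not modelled here. The "network" has no weights; it only shows where each drawn function acts. Names are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List

# Hypothetical pool of fixed-form ("static") activations; the paper's pool,
# which also includes learnable ("dynamic") functions, may differ.
POOL: Dict[str, Callable[[float], float]] = {
    "relu": lambda x: max(0.0, x),
    "leaky_relu": lambda x: x if x > 0 else 0.01 * x,
    "elu": lambda x: x if x > 0 else math.exp(x) - 1.0,
    "tanh": math.tanh,
}


def draw_activations(num_layers: int, seed: int) -> List[str]:
    """Pick one activation per activation layer, uniformly and independently."""
    rng = random.Random(seed)
    names = sorted(POOL)
    return [rng.choice(names) for _ in range(num_layers)]


def apply_layers(x: List[float], layer_acts: List[str]) -> List[float]:
    """Toy forward pass: each 'layer' only applies its activation element-wise."""
    for name in layer_acts:
        f = POOL[name]
        x = [f(v) for v in x]
    return x


# Different seeds give different networks, i.e. different ensemble members.
members = [draw_activations(num_layers=4, seed=s) for s in range(3)]
```

Seeding each draw makes every ensemble member reproducible while still giving members different activation layouts.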

8.
Sensors (Basel) ; 19(23)2019 Nov 28.
Article in English | MEDLINE | ID: mdl-31795280

Abstract

Face detection is a fundamental problem in computer vision. In this paper, we present an experimentally derived ensemble of six face detectors that maximizes the number of true positives while reducing the number of false positives produced by the ensemble. False positives are removed using several filtering steps based primarily on the characteristics of the depth map of the image subwindows that contain candidate faces. A new filtering approach based on processing the image with different wavelets is also proposed. The experimental results show that the filtering steps used in our best ensemble reduce the number of false positives without decreasing the detection rate. This finding is validated on a combined dataset, built from four other datasets, totaling 549 images that include 614 upright frontal faces acquired in unconstrained environments. The dataset provides both 2D and depth data. For further validation, the proposed ensemble is tested on the well-known BioID benchmark dataset, where it obtains a 100% detection rate with an acceptable number of false positives.

9.
Curr Pharm Des ; 24(34): 4007-4012, 2018.
Article in English | MEDLINE | ID: mdl-30417778

Abstract

BACKGROUND: Anatomical Therapeutic Chemical (ATC) classification of unknown compounds is highly significant for both drug development and basic research. The ATC system is a multi-label classification system proposed by the World Health Organization (WHO) that categorizes drugs into classes according to their therapeutic effects and characteristics. The system comprises five levels, each including several classes; the first level includes 14 main overlapping classes. Because the ATC classification system simultaneously considers anatomical distribution, therapeutic effects, and chemical characteristics, predicting the ATC classes of an unknown compound is an essential problem: such a prediction could be used to deduce not only a compound's possible active ingredients but also its therapeutic, pharmacological, and chemical properties. Nevertheless, automatic prediction is very challenging because of the high variability of the samples and the overlap among classes, which results in multiple predictions and makes machine learning extremely difficult. METHODS: In this paper, we propose a multi-label classifier system based on deep learned features to infer the ATC classification. The system is based on a 2D representation of the samples: first, a 1D feature vector is obtained by extracting information about a compound's chemical-chemical interactions and its structural and fingerprint similarities to other compounds belonging to the different ATC classes; then, this 1D feature vector is reshaped into a 2D matrix representation of the compound. Finally, a convolutional neural network (CNN) is trained and used as a feature extractor. Two general-purpose classifiers designed for multi-label classification are trained on the deep learned features, and their resulting scores are fused by the average rule.
RESULTS: Experimental evaluation based on rigorous cross-validation demonstrates the superior prediction quality of this method compared with other state-of-the-art approaches developed for this problem. CONCLUSION: Extensive experiments demonstrate that the new CNN-based predictor outperforms other existing predictors in the literature on almost all of the five metrics used to evaluate multi-label systems, particularly the "absolute true" rate and the "absolute false" rate, the two most significant indexes. Matlab code will be available at https://github.com/LorisNanni.
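Two steps of this pipeline, reshaping the 1D feature vector into a 2D matrix and fusing classifier scores by the average rule, can be sketched as below. The abstract does not give the matrix dimensions, so the square layout with zero padding is an assumption made for illustration; the CNN and the two multi-label classifiers are not shown, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def to_square_matrix(features: Sequence[float], pad: float = 0.0) -> List[List[float]]:
    """Lay a 1D vector out row by row in the smallest square matrix that holds it,
    filling the unused tail cells with `pad` (square shape and zero padding are
    assumptions, not taken from the paper)."""
    n = len(features)
    side = math.isqrt(n)
    if side * side < n:
        side += 1
    cells = list(features) + [pad] * (side * side - n)
    return [cells[r * side:(r + 1) * side] for r in range(side)]


def average_rule(*scores: Sequence[float]) -> List[float]:
    """Fuse per-class scores from several multi-label classifiers by averaging."""
    if not scores:
        raise ValueError("need at least one classifier")
    return [sum(col) / len(scores) for col in zip(*scores)]


print(to_square_matrix([1, 2, 3, 4, 5]))  # prints [[1, 2, 3], [4, 5, 0.0], [0.0, 0.0, 0.0]]
```

For the 14 first-level ATC classes, each classifier would emit a 14-element score vector, and `average_rule` would return the per-class mean of those vectors.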


Subjects
Neural Networks, Computer , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/classification , Deep Learning , Humans
10.
Article in English | MEDLINE | ID: mdl-29994096

Abstract

Bioimage classification is becoming increasingly important in many biological studies, including those that require accurate cell phenotype recognition, subcellular localization, and histopathological classification. In this paper, we present a new General Purpose (GenP) bioimage classification method that can be applied to a wide range of classification problems. The GenP system we propose is an ensemble that combines multiple texture features (both handcrafted and learned descriptors) for superior and generalizable discriminative power. Our ensemble boosts performance by combining local features, dense sampling features, and deep learning features. Each descriptor is used to train a different Support Vector Machine (SVM), and the SVMs are then combined by sum rule. We evaluate our method on a diverse set of bioimage classification tasks, each represented by a benchmark database, including some of those available in the IICBU 2008 database. Each task represents a typical subcellular, cellular, or tissue-level classification problem. Our evaluation on these datasets demonstrates that the proposed GenP bioimage ensemble obtains state-of-the-art performance without any ad hoc tuning of the parameters to each dataset (thereby avoiding any risk of overfitting/overtraining). To reproduce the experiments reported in this paper, the MATLAB code of all the descriptors is available at https://github.com/LorisNanni and https://www.dropbox.com/s/bguw035yrqz0pwp/ElencoCode.docx?dl=0.

11.
J Neurosci Methods ; 302: 42-46, 2018 05 15.
Article in English | MEDLINE | ID: mdl-29104000

Abstract

BACKGROUND: Alzheimer's disease (AD) is the most common cause of neurodegenerative dementia in the elderly population. Research is very active on the challenge of designing automated approaches that achieve an early and accurate diagnosis. Recently, an international competition among AD predictors was organized: "A Machine learning neuroimaging challenge for automated diagnosis of Mild Cognitive Impairment" (MLNeCh). This competition is based on pre-processed sets of T1-weighted Magnetic Resonance Images (MRI) to be classified into four categories: stable AD, individuals with MCI who converted to AD, individuals with MCI who did not convert to AD, and healthy controls. NEW METHOD: In this work, we propose a method for the early diagnosis of AD, which is evaluated on the MLNeCh dataset. Since the automatic classification of AD relies on high-dimensional feature vectors, different feature selection/reduction techniques are compared in order to avoid the curse of dimensionality; the classification method is then obtained as a combination of Support Vector Machines trained on different clusters of data extracted from the whole training set. RESULTS: The multi-classifier approach proposed in this work outperforms all the stand-alone methods tested in our experiments. The final ensemble is based on a set of classifiers, each trained on a different cluster of the training data. The proposed ensemble has the great advantage of performing well on a greatly reduced version of the data (the reduction factor is more than 90%). The MATLAB code for the ensemble of classifiers will be made publicly available to other researchers for future comparisons.


Subjects
Alzheimer Disease/classification , Alzheimer Disease/diagnostic imaging , Brain/diagnostic imaging , Cognitive Dysfunction/classification , Cognitive Dysfunction/diagnostic imaging , Magnetic Resonance Imaging , Support Vector Machine , Aged , Databases, Factual , Early Diagnosis , Female , Humans , Image Interpretation, Computer-Assisted/methods , Male , Pattern Recognition, Automated , ROC Curve
12.
Comput Biol Med ; 72: 239-47, 2016 May 01.
Article in English | MEDLINE | ID: mdl-26656952

Abstract

In this paper, we propose a new method for improving the performance of 2D descriptors by building an n-layer image using different preprocessing approaches; multilayer descriptors are extracted from this image and used as feature vectors to train a Support Vector Machine. The different preprocessing approaches are used to build different n-layer images (n=3, n=5, etc.). We test both color and gray-level images, two well-known texture descriptors (Local Phase Quantization and Local Binary Pattern), and three of their variants suited to n-layer images (Volume Local Phase Quantization, Local Phase Quantization Three-Orthogonal-Planes, and Volume Local Binary Patterns). Our results show that multilayer images and texture descriptors can be combined to outperform standard single-layer approaches. Experiments on 10 datasets demonstrate the generalizability of the proposed descriptors. Most of these datasets are medical, but in each case the images are very different. Two datasets are completely unrelated to medicine and are included to demonstrate the discriminative power of the proposed descriptors across very different image recognition tasks. A MATLAB version of the complete system developed in this paper will be made available at https://www.dei.unipd.it/node/2357.


Subjects
Diagnostic Imaging/classification , Diagnostic Imaging/standards , Humans , Support Vector Machine
13.
Comput Intell Neurosci ; 2015: 909123, 2015.
Article in English | MEDLINE | ID: mdl-26413089

Abstract

We perform an extensive study of the performance of different classification approaches on twenty-five datasets (fourteen image datasets and eleven UCI data mining datasets). The aim is to find General-Purpose (GP) heterogeneous ensembles (requiring little to no parameter tuning) that perform competitively across multiple datasets. The state-of-the-art classifiers examined in this study include the support vector machine, Gaussian process classifiers, random subspace of adaboost, random subspace of rotation boosting, and deep learning classifiers. We demonstrate that a heterogeneous ensemble based on the simple fusion by sum rule of different classifiers performs consistently well across all twenty-five datasets. The most important result of our investigation is demonstrating that some very recent approaches, including the heterogeneous ensemble we propose in this paper, are capable of outperforming an SVM classifier (implemented with LibSVM), even when both kernel selection and SVM parameters are carefully tuned for each dataset.


Subjects
Algorithms , Artificial Intelligence , Pattern Recognition, Automated , Computer Simulation , Datasets as Topic , Humans , Learning , Normal Distribution , ROC Curve
14.
ScientificWorldJournal ; 2014: 236717, 2014.
Article in English | MEDLINE | ID: mdl-25028675

Abstract

Many domains would benefit from reliable and efficient systems for automatic protein classification. An area of particular interest in recent studies on automatic protein classification is the exploration of new methods for extracting features from a protein that work well for specific problems. These methods, however, do not generalize and have proven useful in only a few domains. Our goal is to evaluate several feature extraction approaches for representing proteins by testing them across multiple datasets. Different types of protein representations are evaluated: those starting from the position-specific scoring matrix (PSSM) of the proteins, those derived from the amino-acid sequence, two matrix representations, and features taken from the 3D tertiary structure of the protein. We also test new variants of protein descriptors. We develop our system experimentally by comparing and combining different descriptors taken from the protein representations. Each descriptor is used to train a separate support vector machine (SVM), and the results are combined by sum rule. Some stand-alone descriptors work well on some datasets but not on others. Through fusion, the different descriptors perform well across all tested datasets, in some cases better than the state of the art.
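Sum-rule fusion, which several of the abstracts in this listing use to combine one SVM per descriptor, reduces to adding the class scores of all classifiers and taking the best class. The sketch below is a minimal illustration with made-up scores; it assumes the classifiers' outputs are already on comparable scales (in practice they are often normalized first, which is not shown), and the function names are hypothetical.

```python
from typing import List, Sequence


def sum_rule(scores: Sequence[Sequence[float]]) -> List[float]:
    """scores[m][c] is classifier m's score for class c; return per-class sums."""
    if not scores:
        raise ValueError("need at least one classifier")
    n_classes = len(scores[0])
    if any(len(row) != n_classes for row in scores):
        raise ValueError("every classifier must score the same classes")
    return [sum(row[c] for row in scores) for c in range(n_classes)]


def predict(scores: Sequence[Sequence[float]]) -> int:
    """Index of the class with the largest fused score."""
    fused = sum_rule(scores)
    return max(range(len(fused)), key=fused.__getitem__)


# Three hypothetical SVMs (one per descriptor) scoring three classes.
svm_scores = [
    [0.9, 0.1, 0.0],
    [0.2, 0.5, 0.3],
    [0.3, 0.6, 0.1],
]
print(predict(svm_scores))  # prints 0
```

In this toy case a majority vote over the individual SVM decisions would choose class 1 (two of three SVMs prefer it), while the sum rule chooses class 0 because the first SVM is far more confident; this is the main practical difference between score-level and decision-level fusion.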


Subjects
Artificial Intelligence , Computational Biology/methods , Proteins/classification , Algorithms , Proteins/chemistry
15.
J Theor Biol ; 360: 109-116, 2014 Nov 07.
Article in English | MEDLINE | ID: mdl-25026218

Abstract

Successful protein structure identification enables researchers to estimate the biological functions of proteins, yet it remains a challenging problem. The most common method for determining an unknown protein's structural class is to perform expensive and time-consuming manual experiments. Because of the availability of amino acid sequences generated in the post-genomic age, it is possible to predict an unknown protein's structural class using machine learning methods given a protein's amino-acid sequence and/or its secondary structural elements. Following recent research in this area, we propose a new machine learning system that is based on combining several protein descriptors extracted from different protein representations, such as position specific scoring matrix (PSSM), the amino-acid sequence, and secondary structural sequences. The prediction engine of our system is operated by an ensemble of support vector machines (SVMs), where each SVM is trained on a different descriptor. The results of each SVM are combined by sum rule. Our final ensemble produces a success rate that is substantially better than previously reported results on three well-established datasets. The MATLAB code and datasets used in our experiments are freely available for future comparison at http://www.dei.unipd.it/node/2357.


Subjects
Models, Genetic , Protein Conformation , Proteins/classification , Proteins/genetics , Software , Algorithms , Amino Acid Sequence , Artificial Intelligence , Support Vector Machine
16.
J Theor Biol ; 359: 120-8, 2014 Oct 21.
Article in English | MEDLINE | ID: mdl-24949993

Abstract

The study of protein-drug interactions is a significant issue for drug development. Unfortunately, performing physical experiments to determine whether a drug and a protein interact is both expensive and time-consuming. Some previous attempts to design an automated system for this task relied on knowledge of the 3D structure of the protein, which is not always available in practice. With the availability of protein sequences generated in the post-genomic age, however, a sequence-based solution to this problem is needed. Following other works in this area, we propose a new machine learning system based on several protein descriptors extracted from several protein representations, such as variants of the position-specific scoring matrix (PSSM) of proteins, the amino-acid sequence, and a matrix representation of a protein. The prediction engine is an ensemble of support vector machines (SVMs), with each SVM trained on a specific descriptor and the results of each SVM combined by sum rule. The overall success rate achieved by our final ensemble is notably higher than previous results reported in the literature on the same datasets using the same testing protocols. MATLAB code and the datasets used in our experiments are freely available for future comparison at http://www.dei.unipd.it/node/2357.


Subjects
Drug Interactions , Metabolic Networks and Pathways , Pharmaceutical Preparations/metabolism , Proteins/chemistry , Proteins/metabolism , Computational Biology , Humans , Molecular Docking Simulation , Molecular Targeted Therapy , Pharmaceutical Preparations/chemistry , Protein Binding , Protein Conformation , Structure-Activity Relationship
17.
Amino Acids ; 44(3): 887-901, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23108592

Abstract

Many domains have a stake in the development of reliable systems for automatic protein classification. Of particular interest in recent studies of automatic protein classification is the exploration of new methods for extracting features from a protein that enhance classification for specific problems. These methods have proven very useful in one or two domains, but they have failed to generalize well across several domains (i.e., classification problems). In this paper, we evaluate several feature extraction approaches for representing proteins with the aim of sequence-based protein classification. Several protein representations are evaluated, starting from: the position-specific scoring matrix (PSSM) of the proteins; the amino-acid sequence; and a matrix representation of the protein, of dimension (length of the protein) × 20, obtained by using substitution matrices to represent each amino acid as a vector. A valuable result is that a texture descriptor can be extracted from the PSSM representation that improves on the performance of standard PSSM-based descriptors. Experimentally, we develop our systems by comparing several protein descriptors on nine different datasets. Each descriptor is used to train a support vector machine (SVM) or an ensemble of SVMs. Although different stand-alone descriptors work well on some datasets (but not on others), we have found that fusing classifiers trained on different descriptors yields good performance across all the tested datasets. Matlab code/Datasets used in the proposed paper are available at http://www.bias.csr.unibo.it\nanni\PSSM.rar.


Subjects
Computational Biology/methods , Proteins/classification , Sequence Analysis, Protein/methods , Animals , Databases, Protein , Humans , Proteins/chemistry , Sequence Homology, Amino Acid , Support Vector Machine
18.
Reprod Biomed Online ; 26(1): 42-9, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23177416

Abstract

One of the most relevant aspects of assisted reproduction technology is the possibility of characterizing and identifying the most viable oocytes or embryos. In most cases, embryologists select them by visual examination, and their evaluation is entirely subjective. Recently, owing to the rapid growth in our capacity to extract texture descriptors from a given image, growing interest has been shown in the use of artificial intelligence methods for embryo or oocyte scoring/selection in IVF programmes. In this work, we concentrate our efforts on predicting the quality of embryos and oocytes from their images, in order to improve the performance of assisted reproduction technology. The artificial intelligence system proposed in this work is based on a set of Levenberg-Marquardt neural networks trained using textural descriptors (the 'local binary patterns'). The proposed system is tested on two data sets, of 269 oocytes and 269 corresponding embryos from 104 women, and compared with other machine learning methods already proposed for similar classification problems. Although the results are only preliminary, they show interesting classification performance. This technique may be of particular interest in countries where legislation restricts embryo selection.
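The local binary pattern descriptor named in this abstract can be sketched in its most basic form: each interior pixel's eight neighbours at radius 1 are thresholded against the centre value, the resulting bits form an 8-bit code, and the image is summarized by a 256-bin histogram of codes. This is a generic illustration, not the authors' configuration; the paper may use a different radius, number of neighbours, or a uniform-pattern mapping, and the function names are hypothetical.

```python
from typing import List

# The 8 neighbours at radius 1, clockwise from the top-left pixel;
# neighbour k contributes bit k of the code.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: List[List[int]], r: int, c: int) -> int:
    """Basic 8-neighbour LBP code of interior pixel (r, c)."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:  # threshold neighbours at the centre value
            code |= 1 << bit
    return code


def lbp_histogram(img: List[List[int]]) -> List[int]:
    """256-bin histogram of LBP codes over interior pixels (border is skipped)."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

The histogram (possibly normalized) is the feature vector that a classifier such as a neural network would then be trained on.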


Subjects
Artificial Intelligence , Blastocyst/classification , Blastocyst/cytology , Image Processing, Computer-Assisted/methods , Oocytes/classification , Oocytes/cytology , Decision Support Techniques , Fertilization in Vitro , Humans , Neural Networks, Computer
19.
Bioinformatics ; 28(8): 1151-7, 2012 Apr 15.
Article in English | MEDLINE | ID: mdl-22390939

Abstract

MOTIVATION: A microarray measures the expression of tens of thousands of genes, producing a feature vector that is high-dimensional and contains much irrelevant information. This dimensionality degrades classification performance. Moreover, datasets typically contain few training samples, leading to the 'curse of dimensionality' problem. It is therefore essential to find good methods for reducing the size of the feature set. RESULTS: In this article, we propose a method for gene microarray classification that combines different feature reduction approaches to improve classification performance. Using a support vector machine (SVM) as our classifier, we examine an SVM trained on a set of selected genes; an SVM trained on the feature set obtained by the Neighborhood Preserving Embedding feature transform; a set of SVMs trained on orthogonal wavelet coefficients from different mother wavelets; a set of SVMs trained on texture descriptors extracted from the microarray, treated as an image; and an ensemble that combines the best of the feature extraction methods listed above. The positive results confirm that combining different feature extraction methods greatly enhances system performance. The experiments were performed on several different datasets, and our results [expressed as both accuracy and area under the receiver operating characteristic (ROC) curve] show the effectiveness of the proposed approach with respect to the state of the art. AVAILABILITY: The MATLAB code of the proposed approach is publicly available at bias.csr.unibo.it/nanni/micro.rar.
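The orthogonal wavelet features mentioned in this abstract can be illustrated with the simplest mother wavelet, the Haar wavelet. The sketch below is an assumption-based illustration rather than the authors' code: the paper uses several mother wavelets and does not state here which coefficients it keeps, and the function names and toy four-gene profile are hypothetical.

```python
import math
from typing import List, Tuple


def haar_step(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT: (approximation, detail) halves."""
    if len(x) % 2:
        raise ValueError("length must be even at every level")
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def haar_features(x: List[float], levels: int) -> List[float]:
    """Multi-level decomposition, returned as [approx_L, detail_L, ..., detail_1]."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.insert(0, d)  # coarser details go first
    out = list(approx)
    for d in details:
        out.extend(d)
    return out


# A toy "expression profile" of four genes, decomposed over two levels.
coeffs = haar_features([4.0, 2.0, 6.0, 8.0], levels=2)
```

Because the transform is orthonormal, the coefficients preserve the signal's energy (here 4² + 2² + 6² + 8² = 120); feature reduction would then come from keeping only a subset of coefficients, such as the coarse approximation.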


Subjects
Neoplasms/genetics , Oligonucleotide Array Sequence Analysis , Support Vector Machine , Area Under Curve , Humans
20.
Amino Acids ; 43(2): 657-65, 2012 Aug.
Article in English | MEDLINE | ID: mdl-21993538

Abstract

The last decade has seen an explosion in the collection of protein data. To realize the potential offered by this wealth of data, it is important to develop machine systems capable of classifying proteins and extracting features from them. Reliable machine systems for protein classification offer many benefits, including the promise of finding novel drugs and vaccines. In developing our system, we analyze and compare several feature extraction methods used in protein classification that are based on calculating texture descriptors from a wavelet representation of the protein. We then feed these texture-based representations of the protein into an AdaBoost ensemble of neural networks or a support vector machine classifier. In addition, we perform experiments that combine our feature extraction methods with a standard method based on Chou's pseudo amino acid composition. Using several datasets, we show that our best approach outperforms standard methods. The Matlab code of the proposed protein descriptors is available at http://bias.csr.unibo.it/nanni/wave.rar .


Subjects
Proteins/classification , Sequence Analysis, Protein , Wavelet Analysis , Amino Acid Sequence , Area Under Curve , Fourier Analysis , Models, Chemical , Neural Networks, Computer , Proteins/chemistry , ROC Curve , Statistics, Nonparametric , Support Vector Machine